-
Notifications
You must be signed in to change notification settings - Fork 53
Add orchestration execution tracing #441
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add orchestration execution tracing #441
Conversation
sophiatev
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just out of curiosity: do we plan to do the same thing for entities at some point?
| completionToken: string.Empty, /* doesn't apply */ | ||
| entityConversionState: null); | ||
| entityConversionState: null, | ||
| // TODO: Should this activity be created? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might lead to redundancy in the case that this is being used by Durable Functions (since DT.Core will also create the trace activity in that case, I think)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was primarily targeting these changes at DT used separately from DF. I will see if I can test the two together. I would expect that DF uses a differently-named activity source, though, so overlap should be ok. I wouldn't want to not trace something in DT because it's being traced in DF.
test/Grpc.IntegrationTests/GrpcSidecar/Dispatcher/GrpcSubOrchestrationInstanceCreatedEvent.cs
Outdated
Show resolved
Hide resolved
test/Grpc.IntegrationTests/GrpcSidecar/Dispatcher/TaskOrchestrationDispatcher.cs
Outdated
Show resolved
Hide resolved
src/Shared/Grpc/ProtoUtils.cs
Outdated
| InstanceId = subOrchestrationAction.InstanceId, | ||
| Name = subOrchestrationAction.Name, | ||
| Version = subOrchestrationAction.Version, | ||
| ParentTraceContext = clientActivityContext is not null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than having to creating a whole TraceContext object and having to create these strings manually, can you just save the span ID? Since it looks like that is the only information TraceHelper.CreateTraceActivityForSchedulingSubOrchestration and TraceHelper.StartTraceActivityForSchedulingTask actually use
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I misunderstand, while it's true that TraceHelper only use a subset of TraceContext, the protos for those history events themselves offer a parent TraceContext so this is necessary to satisfy that original contract (e.g. across all platforms). Currently, for example, DTS is passing the original parent context at orchestration creation to those events, which is be better than nothing, but results in gaps in the trace. Allowing for the context to round trip gives the SDKs flexibility in adding layers to the trace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, I did do some cleanup to simplify/consolidate creation of these context objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I see what you mean, it just makes me quite nervous that we're having to build these manually, and that we're altering the parent trace context to now hold an injected client span id rather than the parent trace activity's span id. An alternative approach is to add an additional field to the proto object corresponding to the clientSpanId, and just set the TraceParent as Activity.Id as you normally would. If anything, I'd say that this is truer to the original contract: other SDKs accessing this object would expect it hold the span id of the parent trace Activity, not an injected client span id.
All that being said, I do see that DT.Core builds a traceparent manually in an identical way so maybe it's not a cause for concern
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's where I got it from, so I figured it ought to work at least as well here and I was trying to minimize the changes needed to the protos. That said, I'm also not crazy about having to build the values by hand.
My first step was just to reach parity with DTFx but I think we'd definitely want to extended to entities/schedules. |
cgillum
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few nit-picks as I go through the PR. I have not yet reviewed everything.
| return null; | ||
| } | ||
|
|
||
| if (startEvent.ParentTraceContext is null || !ActivityContext.TryParse(startEvent.ParentTraceContext.TraceParent, startEvent.ParentTraceContext.TraceState, out ActivityContext activityContext)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can you wrap these long lines? Per the .editorconfig file (which not all IDEs support natively, though VSCode.dev seems to), we try to keep line length to a maximum of 120 characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe I've gotten all of these.
| /// <remarks> | ||
| /// Adapted from "https://github.com/Azure/durabletask/blob/main/src/DurableTask.Core/Tracing/TraceHelper.cs". | ||
| /// </remarks> | ||
| class TraceHelper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: make this static class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
| /// <summary> | ||
| /// Constants for trace activity names used in Durable Task Framework. | ||
| /// </summary> | ||
| class TraceActivityConstants |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: make this static class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
| } | ||
| } | ||
|
|
||
| if (Activity.Current is not null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: rather than checking separately for Activity.Current being null, could we just move this logic into the previous newActivity != null code block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this makes sense.
| { | ||
| P.HistoryEvent? GetSuborchestrationInstanceCreatedEvent(int eventId) | ||
| { | ||
| var subOrchestrationEvent = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: per the existing conventions in this repo, please try to use explicit type names instead of var. It makes the code easier to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe I've updated all of these.
|
@sophiatev Is there more feedback on this change? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, but just a few questions:
- How did we actually get this working? When I was trying to do add distributed tracing for entities I had to do a lot of work-arounds and add most of it to DT.Core because we seemingly couldn't add an Activity source to the dotnet SDK at the time. Did that change? (I wish I would have done my work after you did then :D. It would have improved my implementation by quite a bit)
- Does the user have to enable distributed tracing in their settings somehow to get this tracing?
- Do we want to make an Activity in the
GrpcOrchestrationRunnerafter all?
@sophiatev I got it working/tested by creating an app that references a private build of the SDK and enables this new activity source (
Yes, users collecting traces would have to add
Are you referring to the earlier comment about Durable Functions also having an activity for that method, or something else? If the former, I'd say yes; the DF traces are from a different source so we'd want to ensure it's logged when DT is used on its own. @sophiatev With tentative approval here, we can now commit the corresponding protobuf changes. That will then mean a small update to this PR to bring them across the "proper way", which will need a re-approval. |
Perhaps a silly question, but have you tested without doing this to make sure that nothing breaks? I made the mistake of not doing so (in my case, not trying a run without distributed tracing enabled) which ended up in a null exception for customers. Want to make sure other's don't repeat my mistakes :D
Yes, I am. But it looks like the |
@sophiatev That's a good test case; yes, I've verified that execution proceeds when tracing has not been enabled.
I verified that |
|
@sophiatev Is there a regular schedule/process for new releases? |
Ports Durable Task Framework (DTFx) tracing behavior to the Durable Task .NET SDK gRPC layer to allow end-to-end tracing of orchestrations from creation to completion, including:
With this change, users can get a complete picture of orchestration execution including how their own traces dovetail with Durable Task execution, as shown in the following example, which follows creation of an orchestration starting from a web API, to execution by a DT worker, with suborchestrations, and back to the web API as part of task execution.
This change depends on corresponding changes to the Durable Task protobuf definitions. It also assumes any backend properly accepts and returns the parent trace contexts given it by the SDK.
Resolves #439